35 research outputs found

    On the Functional Test of Special Function Units in GPUs

    The use of Graphics Processing Units (GPUs) has extended from graphics applications to others where their high computational power is exploited (e.g., to implement Artificial Intelligence algorithms). These complex applications usually need highly intensive computations based on floating-point transcendental functions. GPUs may efficiently compute these functions in hardware using ad hoc Special Function Units (SFUs). However, a permanent fault in such units could be very critical (e.g., in safety-critical automotive applications). Thus, test methodologies for SFUs are strictly required to achieve the target reliability and safety levels. In this work, we present a functional test method based on a Software-Based Self-Test (SBST) approach targeting the SFUs in GPUs. This method combines different approaches to build a test program and applies several optimization strategies that exploit GPU parallelism to speed up the test procedure and reduce the required memory. The effectiveness of this methodology was demonstrated using an open-source GPU model (FlexGripPlus) compatible with NVIDIA GPUs. The experimental results show that the proposed technique achieves 90.75% fault coverage and up to 94.26% Testable Fault Coverage, reducing the required memory and test duration with respect to pseudorandom strategies proposed by other authors.

    FPGA-based translation system from Colombian Sign Language to text

    This paper presents the development of a system intended to facilitate the communication and interaction of people with severe hearing impairment with other people. The system employs artificial vision techniques for the recognition of static signs of Colombian Sign Language (LSC). The system has four stages: image capture, preprocessing, feature extraction, and recognition. The image is captured by a TRDB-D5M digital camera for Altera’s DE1 and DE2 development boards. In the preprocessing stage, the sign is extracted from the background of the image using the thresholding segmentation method; then, the segmented image is filtered using a morphological operation to remove noise. The feature extraction stage is based on the creation of two vectors that characterize the shape of the hand used to make the sign. The recognition stage consists of a multilayer perceptron (MLP) neural network, which functions as a classifier. The system was implemented in Altera’s Cyclone II EP2C70F896C6 FPGA device and does not require the use of gloves or visual markers for its proper operation. The results show that the system is able to recognize all 23 signs of the LSC with a recognition rate of 98.15%.
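
The preprocessing stage described above (threshold segmentation followed by a morphological filter) can be sketched in a few lines. This is a minimal software illustration, not the authors' FPGA implementation: the abstract does not say which morphological operation is used, so a 3x3 opening (erosion, then dilation) is assumed here, and the threshold level, the function names, and the tiny test image are all hypothetical.

```python
def threshold(image, level):
    """Binarize a grayscale image: 1 where the pixel exceeds level, else 0."""
    return [[1 if px > level else 0 for px in row] for row in image]


def _filter3x3(binary, op):
    """Apply op (min or max) over each pixel's 3x3 neighborhood.

    Neighbors that fall outside the image are simply ignored.
    """
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = op(window)
    return out


def opening(binary):
    """Morphological opening: erosion (min) followed by dilation (max)."""
    return _filter3x3(_filter3x3(binary, min), max)


# Hypothetical 6x6 image: a bright 3x3 "hand" region plus one noisy pixel.
image = [
    [10, 10, 10, 10, 10, 200],
    [10, 200, 200, 200, 10, 10],
    [10, 200, 200, 200, 10, 10],
    [10, 200, 200, 200, 10, 10],
    [10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
]
segmented = threshold(image, level=128)  # assumed threshold level
cleaned = opening(segmented)
```

In this example the isolated bright pixel in the top-right corner survives thresholding but is removed by the opening, while the 3x3 region is preserved.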

    Exploring Hardware Fault Impacts on Different Real Number Representations of the Structural Resilience of TCUs in GPUs

    The most recent generations of graphics processing units (GPUs) boost the execution of convolutional operations required by machine learning applications by resorting to specialized and efficient in-chip accelerators (Tensor Core Units or TCUs) that operate on matrix multiplication tiles. Unfortunately, modern cutting-edge semiconductor technologies are increasingly prone to hardware defects, and the trend to highly stress TCUs during the execution of safety-critical and high-performance computing (HPC) applications increases the likelihood of TCUs producing different kinds of failures. In fact, the intrinsic resiliency to hardware faults of arithmetic units plays a crucial role in safety-critical applications using GPUs (e.g., in automotive, space, and autonomous robotics). Recently, new arithmetic formats have been proposed, particularly those suited to neural network execution. However, the reliability characterization of TCUs supporting different arithmetic formats was still lacking. In this work, we quantitatively assessed the impact of hardware faults in TCU structures while employing two distinct formats (floating-point and posit) and using two different configurations (16 and 32 bits) to represent real numbers. For the experimental evaluation, we resorted to an architectural description of a TCU core (PyOpenTCU) and performed 120 fault simulation campaigns, injecting around 200,000 faults per campaign and requiring around 32 days of computation. Our results demonstrate that the posit format of TCUs is less affected by faults than the floating-point one (by up to three orders of magnitude for 16 bits and up to twenty orders for 32 bits). We also identified the most sensitive fault locations (i.e., those that produce the largest errors), thus paving the way to adopting smart hardening solutions.
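
To illustrate why the error magnitude depends so strongly on the fault location, the sketch below inverts a single bit in the IEEE 754 binary32 encoding of a value and reports the relative error. It is a simplified illustration only: it does not use PyOpenTCU, it models one inverted bit in a stored operand rather than faults in TCU structures, and it covers only the floating-point format (posit decoding is not shown). The function names are hypothetical.

```python
import struct


def flip_bit_float32(value, bit):
    """Invert one bit of the binary32 encoding of value.

    Bit 0 is the least significant mantissa bit, bits 23-30 are the
    exponent, and bit 31 is the sign.
    """
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def relative_error(golden, faulty):
    """Relative deviation of the faulty value from the fault-free one."""
    return abs(faulty - golden) / abs(golden)


golden = 0.5
for bit in (0, 22, 30, 31):
    faulty = flip_bit_float32(golden, bit)
    print(f"bit {bit:2d}: {faulty!r:>24}  rel. error = {relative_error(golden, faulty):.3e}")
```

For the value 0.5, a flip in the lowest mantissa bit changes the result by roughly one part in ten million, while a flip in the top exponent bit turns it into 2^127. This is the kind of location dependence that a fault-location ranking would capture.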

    Programmers manual FlexGripPlus SASS SM 1.0

    This document describes the opcodes of the SASS assembly language of the G80 architecture used in the FlexGripPlus model. Every instruction is compatible with the CUDA programming environment under the SM_1.

    Evaluating the impact of Permanent Faults in a GPU running a Deep Neural Network

    Currently, Deep Neural Networks (DNNs) are fundamental computational structures deployed in a wide range of modern application domains (e.g., data analysis, healthcare, automotive, robotics). Computational complexity is inherent in these cognitive models, which demand high-performance devices like Graphics Processing Units (GPUs). Therefore, the implementation of DNNs on GPU devices is becoming increasingly frequent, even for cutting-edge safety-critical applications (e.g., autonomous and semi-autonomous cars). Thus, the reliability evaluation of these applications is mandatory because several phenomena (including aging) may produce permanent defects in the GPU, thus inducing the DNN to produce wrong results. Until now, the effects of permanent faults on DNNs have been investigated mainly at the application level only (e.g., by acting on the parameters of the network). This paper presents an environment allowing, for the first time, a more detailed experimental evaluation of the impact of permanent faults in a GPU on the reliability of a DNN running on it, based on considering faults at the architectural level. The results of the fault injection campaigns we performed on the GPU register files are compared with those at the application level, showing that the latter are generally optimistic.
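
The difference between an architectural-level permanent fault and a one-off corruption of a network parameter can be sketched with a stuck-at fault in a toy register file. This is a minimal sketch under stated assumptions, not the environment presented in the paper: the `RegisterFile` class, the three-register layout, and the integer dot product standing in for a neuron are all hypothetical, and values are treated as non-negative integers.

```python
def apply_stuck_at(value, bit, stuck_value, width=32):
    """Force one bit of a width-bit word to stuck_value (0 or 1)."""
    mask = 1 << bit
    value &= (1 << width) - 1
    return (value | mask) if stuck_value else (value & ~mask)


class RegisterFile:
    """Toy register file; fault is (register, bit, stuck_value) or None."""

    def __init__(self, size, fault=None):
        self.regs = [0] * size
        self.fault = fault

    def write(self, reg, value):
        self.regs[reg] = value

    def read(self, reg):
        value = self.regs[reg]
        # A permanent fault corrupts every read of the affected register.
        if self.fault is not None and self.fault[0] == reg:
            _, bit, stuck_value = self.fault
            value = apply_stuck_at(value, bit, stuck_value)
        return value


def dot_product(rf, weights, inputs):
    """Multiply-accumulate loop that keeps all operands in registers."""
    ACC, W, X = 0, 1, 2  # hypothetical register allocation
    rf.write(ACC, 0)
    for w, x in zip(weights, inputs):
        rf.write(W, w)
        rf.write(X, x)
        rf.write(ACC, rf.read(ACC) + rf.read(W) * rf.read(X))
    return rf.read(ACC)


weights, inputs = [1, 2, 3], [4, 5, 6]
golden = dot_product(RegisterFile(3), weights, inputs)
# Bit 4 of the accumulator register is stuck at 1.
faulty = dot_product(RegisterFile(3, fault=(0, 4, 1)), weights, inputs)
print(golden, faulty)
```

Because the fault sits in the accumulator register, it affects every iteration of the loop and every later computation that reuses that register. An application-level injection that alters a single weight does not model this repeated exposure.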

    A Multi-level Approach to Evaluate the Impact of GPU Permanent Faults on CNN's Reliability

    Graphics processing units (GPUs) are widely used to accelerate Artificial Intelligence applications, such as those based on Convolutional Neural Networks (CNNs). Since in some domains in which CNNs are heavily employed (e.g., automotive and robotics) the expected lifetime of GPUs is over ten years, it is of paramount importance to study the impact of permanent faults (e.g., due to aging). Crucially, while the impact of transient faults on GPUs running CNNs has been widely studied, an accurate evaluation of the impact of permanent faults is still lacking. Performing this evaluation is challenging due to the complexity of GPU devices and the software implementing a CNN. In this work, we propose a methodology that combines the accuracy of gate-level fault simulation with the speed and flexibility of software fault injection to evaluate the effects of permanent hardware faults affecting a GPU. First, we profile the low-level GPU instructions executed during the CNN inference. Then, using extensive gate-level fault injection campaigns, we provide an accurate analysis of the effects of permanent faults on the internal modules executing the targeted instructions. Finally, we propagate these effects using fast software-based fault injection. The method makes it possible, for the first time, to estimate the percentage of permanent faults leading the CNN to produce wrong results (i.e., changing the result of its work). The method's feasibility, including its ability to flexibly trade off accuracy against the required computational effort, is shown using LeNet running on an NVIDIA Ampere GPU as a case study. The method reduces the computational effort for the evaluation by several orders of magnitude with respect to plain gate- and RTL-level fault simulation.

    Characterizing a Neutron-Induced Fault Model for Deep Neural Networks

    The reliability evaluation of Deep Neural Networks (DNNs) executed on Graphics Processing Units (GPUs) is a challenging problem since the hardware architecture is highly complex and the software frameworks are composed of many layers of abstraction. While software-level fault injection is a common and fast way to evaluate the reliability of complex applications, it may produce unrealistic results since it has limited access to the hardware resources and the adopted fault models may be too naive (i.e., single and double bit flip). Conversely, physical fault injection with a neutron beam provides realistic error rates but lacks fault propagation visibility. This paper proposes a characterization of the DNN fault model combining both neutron beam experiments and fault injection at the software level. We exposed GPUs running General Matrix Multiplication (GEMM) and DNNs to a neutron beam to measure their error rate. On DNNs, we observe that the percentage of critical errors can be up to 61%, and show that ECC is ineffective in reducing critical errors. We then performed a complementary software-level fault injection, using fault models derived from RTL simulations. Our results show that by injecting complex fault models, the YOLOv3 misdetection rate is validated to be very close to the rate measured with beam experiments, which is 8.66× higher than the one measured with fault injection using only single-bit flips.

    PFS - Reliability Assessment of Neural Networks in GPUs

    Currently, deep learning, and especially Convolutional Neural Networks (CNNs), has become a fundamental computational approach applied in a wide range of domains, including some safety-critical applications (e.g., automotive, robotics, and healthcare equipment). Therefore, the reliability evaluation of those computational systems is mandatory. The reliability evaluation of CNNs is performed by fault injection campaigns at different levels of abstraction, from the application level down to the hardware level. Many works have focused on evaluating the reliability of neural networks in the presence of transient faults. However, the effects of permanent faults have been investigated at the application level only (e.g., by targeting the parameters of the network). This paper presents ongoing work on the reliability evaluation of CNNs targeting permanent faults in GPU devices, considering different fault injection levels. Our preliminary results show that fault injections performed at the application level generate more optimistic results than architectural-level fault injection.

    Sistema traductor de la lengua de señas colombiana a texto basado en FPGA

    This work presents the development of a system designed to facilitate the communication and interaction of people with severe hearing impairment with other people. The system employs artificial vision techniques for the recognition of the static signs of Colombian Sign Language (LSC). The system has four stages: image capture, preprocessing, feature extraction, and recognition. The image is captured by a TRDB-D5M digital camera designed for Altera's DE1 and DE2 development boards. In the preprocessing stage, the sign is extracted from the background of the image using the threshold segmentation method; the segmented image is then filtered using a morphological operation to remove noise. The feature extraction stage is based on the creation of two vectors that characterize the shape of the hand used to make the sign. The recognition stage consists of a multilayer perceptron (MLP) artificial neural network, which acts as a classifier. The system was implemented in the Cyclone II EP2C70F896C6 FPGA device and does not require the use of gloves or visual markers for its proper operation. The results show that the system is able to recognize all 23 static signs of the LSC with a recognition rate of 98.15%.